forked from llvm/llvm-project
Gpu known subgroup size #4
Open
FMarno
wants to merge
119
commits into
main
from
gpu_known_subgroup_size
This manifested as an assertion failure in Clang built against libc++ with hardening enabled (e.g. -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG): `libcxx/include/__memory/unique_ptr.h:596: assertion __checker_.__in_bounds(std::__to_address(__ptr_), __i) failed: unique_ptr<T[]>::operator[](index): index out of range`
After 7f74651, the pointer operand of a PtrAdd may be replicated. Instead of requesting a single scalar, request lane 0, which correctly handles the case when there is a scalar-per-lane. Fixes llvm#111606.
This commit adds the ViewLikeOpInterface to the GEP and AddrSpaceCast operations. This allows us to simplify the inliner interface. At the same time, the change also makes the inliner interface more extensible for downstream users that have custom view-like operations.
…aders for LinalgDialect (llvm#111603) This fixes non-deterministic build failures. Fixes llvm#111527 --------- Co-authored-by: zecheng.zhang <[email protected]> Co-authored-by: Mehdi Amini <[email protected]>
llvm#111451) We add static methods to APFloatBase to allow the hasZero and hasSignedRepr properties of fltSemantics to be obtained.
) On Apple platforms, using system-libcxxabi as an ABI library wouldn't work because we'd try to re-export symbols from libc++abi that the system libc++abi.dylib might not have. Instead, only re-export those symbols when we're using the in-tree libc++abi. This does mean that libc++.dylib won't re-export any libc++abi symbols when building against the system libc++abi, which could be fixed in various ways. However, the best solution really depends on the intended use case, so this patch doesn't try to solve that problem. As a drive-by, also improve the diagnostic message when the user forgets to set the LIBCXX_CXX_ABI_INCLUDE_PATHS variable, which would previously lead to a confusing error. Closes llvm#104672
…11533) This patch implements speculation for vector.transfer_read/vector.transfer_write ops, allowing these ops to work with LICM.
Add the permutation clause for the interchange directive, which will be introduced in the upcoming OpenMP 6.0 specification. A preview has been published in [Technical Report 12](https://www.openmp.org/wp-content/uploads/openmp-TR12.pdf).
…ls. (llvm#109708) Make legacy cost retrieval independent of getInstructionForCost by sinking it to more specific ::computeCost implementation (specifically VPInterleaveRecipe::computeCost and VPSingleDefRecipe::computeCost). Inline getInstructionForCost to VPRecipeBase::cost(), as it is now only used to decide which recipes to skip during cost computation and when to apply forced costs. PR: llvm#109708
…t(pcmpeq(and(X,Pow2),Pow2),B,A) Matches what we already do in LowerVSETCC to reuse an existing constant Fixes llvm#110875
…lvm#111600) The `SymbolTableListTraits` template is explicitly instantiated for the following types:
* `llvm/lib/IR/Function.cpp`
  - `BasicBlock`
* `llvm/lib/IR/Module.cpp`
  - `Function`
  - `GlobalAlias`
  - `GlobalIFunc`
  - `GlobalVariable`

When LLVM is built on Windows with the `LLVM_EXPORT_SYMBOLS_FOR_PLUGINS` option enabled, the implicit instantiation of the template prevents the `SymbolTableListTraits` template from being exported. This causes link errors when the template or IR API is used in a plugin. This change prevents the template being implicitly instantiated for these types.
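As a rough illustration of the mechanism involved: standard C++ lets a header suppress implicit instantiation with an explicit instantiation declaration (`extern template`), leaving one explicit instantiation definition in a single source file. The sketch below uses hypothetical names (`ListTraits`, `Block`, `blockNodeSize`, `demoExplicitInstantiation`) and keeps both halves in one file for brevity. It is an assumption about the general technique, not the actual LLVM change.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a traits template such as SymbolTableListTraits.
template <typename T>
struct ListTraits {
  static std::size_t nodeSize();
};

template <typename T>
std::size_t ListTraits<T>::nodeSize() {
  return sizeof(T);
}

// Hypothetical node type.
struct Block {
  int id;
};

// "Header" side: explicit instantiation declaration. Code that sees this
// does not implicitly instantiate ListTraits<Block>::nodeSize itself.
extern template struct ListTraits<Block>;

// "Source file" side: the single explicit instantiation definition that
// actually emits the symbols (and is the place an export macro would apply).
template struct ListTraits<Block>;

// A user of the template links against the explicitly instantiated copy.
std::size_t blockNodeSize() { return ListTraits<Block>::nodeSize(); }

void demoExplicitInstantiation() {
  assert(blockNodeSize() == sizeof(Block));
}
```

In a real build the declaration would live in the header and the definition in exactly one `.cpp` file, so every client, including a plugin, refers to the same exported instantiation.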
…1540) This test is currently XFAILed entirely as part of patch llvm#106077. However, it works on the A and R profiles and fails only on the v7-M profile, because it contains cases in which M-profile fails for atomic types larger than 4 bytes.
This was using addrspace 0 and 1 pointers interchangeably. That happens to work because they have the same size, but consistently query or use the correct one.
…#111541 into a canonicalization (llvm#111614) This is a reasonable canonicalization because `extract` is more constrained than `extract_strided_slices`, so there is no loss of semantics here, just lifting an op to a special-case higher/constrained op. And the additional `shape_cast` is merely adding leading unit dims to match the original result type. Context: discussion on llvm#111541. I wasn't sure how this would turn out, but in the process of writing this PR, I discovered at least 2 bugs in the pattern introduced in llvm#111541, which shows the value of shared canonicalization patterns which are exercised on a high number of testcases. --------- Signed-off-by: Benoit Jacob <[email protected]>
With some restrictions, BIND(C) derived types can be converted to compatible BIND(C) derived types. Semantics already support this, but ConvertOp was missing the conversion of such types. Fixes llvm#107783
These should be well behaved address computations.
…oops (llvm#111656) Properly handles `cycle` branching inside target distribute loops.
Same logic as other callsites, if the attributes are intersectable, we merge. Closes llvm#111713
…#111759) These were split in 0e8208e, with the only functional difference between them at the time being `--prepend_env PATH=%{lib-dir}` in the static config and `--prepend_env PATH=%{install-prefix}/bin` in the shared library config. However this difference is unnecessary - the static library config doesn't need any `--prepend_env` argument at all. Before 0e8208e, both configurations used the same config file, where the `--prepend_env` argument was unnecessary but benign in the static case. Reduce the unnecessary config duplication in this case, and return these configs to using one single config file for both setups.
The AArch64 FMINNM/FMAXNM instructions follow IEEE 754-2008, so we can use them to canonicalize a floating-point number. FMINNUM_IEEE/FMAXNUM_IEEE are also used when expanding FMINIMUMNUM/FMAXIMUMNUM, so let's define them. Update combine_andor_with_cmps.ll. Add fp-maximumnum-minimumnum.ll, with nnan test cases only. V1F64 is not supported yet. If we set v1f64 as legal, FMINNUM/FMAXNUM will have a problem: both of them use `if (isOperationLegalOrCustom(FMAXNUM_IEEE, VT))`, and AArch64 depends on `expandFMINNUM_FMAXNUM` returning `SDValue()` for FMAXNUM and FMINNUM. We should fix this problem, but that is left for a future patch.
This finishes the clang implementation of P0522, getting rid of the fallback to the old, pre-P0522 rules.

Before this patch, when partial ordering template template parameters, we would perform, in order:
* If the old rules would match, we would accept it. Otherwise, don't generate diagnostics yet.
* If the new rules would match, just accept it. Otherwise, again don't generate any diagnostics yet.
* Apply the old rules again, this time with diagnostics.

This situation was far from ideal, as we would sometimes:
* Accept some things we shouldn't.
* Reject some things we shouldn't.
* Only diagnose rejection in terms of the old rules.

With this patch, we apply the P0522 rules throughout.

This required extending template argument deduction to accept the historical rule for TTP matching of a pack parameter to non-pack arguments. This change also makes us accept some combinations of historical and P0522 allowances we wouldn't before.

It also fixes a bunch of bugs that were documented in the test suite; I am not sure whether issues have already been created for them.

This causes a lot of changes to the way these failures are diagnosed, with related test suite churn.

The problem here is that the old rules were very simple and non-recursive, making it easy to provide customized diagnostics and to keep them consistent with each other. The new rules are a lot more complex: they rely on template argument deduction and substitutions, and they are recursive.

The approach taken here is to mostly rely on existing diagnostics, and to create a new instantiation context that keeps track of this context. So, for example, when a substitution failure occurs, we use the error produced there unmodified, and just attach notes to it explaining that it occurred in the context of partial ordering this template argument against that template parameter.
This diverges from the old diagnostics, which would lead with an error pointing to the template argument, explain the problem in subsequent notes, and produce a final note pointing to the parameter.
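The historical rule mentioned above, where a template template parameter declared with a parameter pack accepts a template whose parameters are ordinary (non-pack), can be sketched as follows. The names `Apply`, `Box`, `packMatchesNonPack` and `demoPackMatching` are hypothetical, and the example shows only the long-standing language rule, not any of the diagnostics or deduction changes in this patch.

```cpp
#include <cassert>
#include <type_traits>

// Template template parameter whose own parameter list is a pack.
template <template <class...> class TT>
struct Apply {
  using type = TT<int>;
};

// Argument template with a single ordinary (non-pack) type parameter.
template <class T>
struct Box {
  T value;
};

// Box is accepted for TT even though TT is declared with a pack.
static_assert(std::is_same<Apply<Box>::type, Box<int>>::value,
              "pack parameter matched a non-pack template");

// Runtime-visible form of the same check.
bool packMatchesNonPack() {
  return std::is_same<Apply<Box>::type, Box<int>>::value;
}

void demoPackMatching() {
  Apply<Box>::type b{42};  // b is a Box<int>
  assert(b.value == 42);
}
```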
Summary: Option `-fskip-odr-check-in-gmf` is set by default and I think it is what most C++ developers want. But in header units, Clang ODR checking is too strict, making them hard to use, as seen in the example in the diff. This diff relaxes ODR checks for unnamed modules to match GMF ODR checking. Test Plan: check-clang
…07350) With this change, we determine whether the primary template, and which partial specializations, would have participated in overload resolution prior to the P0522 changes. We collect those in an initial set. If this set is not empty, or the primary template would have matched, we proceed with this set as the candidates for overload resolution. Otherwise, we build a new overload set with everything else, and proceed as usual.
…te calls. (llvm#111457) Clang previously missed implementing P0522 pack matching for deduced function template calls. Fixes llvm#111363
Extra builders for CallIntrinsicOp. This is inspired by the comment from @antiagainst from [here](llvm#108933 (comment)).
…t attributes and undeclared templates (llvm#107786) Fixes llvm#107047 Fixes llvm#49093
…11679) This DAG combine replaces a floating-point load/store pair which has no other uses with an integer one, but did not copy the memory operand flags to the new instructions, resulting in it dropping the volatile flag. This optimisation is still valid if one or both of the instructions is volatile, so we can copy over the whole MachineMemOperand to generate volatile integer loads and stores where needed.
These might also be called with vectors, but we don't support that.
This does a global rename from `flang-new` to `flang`. I also removed/changed any TODOs that I found related to making this change. --------- Co-authored-by: H. Vetinari <[email protected]> Co-authored-by: Andrzej Warzynski <[email protected]>
llvm#111797) This commit fixes a bug in the import of nameless globals. Before this change, the fake symbol names were only generated during the transformation of the definition. This caused issues when the symbol was used before it was defined.
…e. (llvm#111428) Similar to 112aac4, this converts log libcalls to llvm.log.f64 intrinsics if we know they do not set errno, as the input is not zero and not negative. As log will produce errno if the input is 0 (returning -inf) or if the input is negative (returning nan), we also perform the conversion when we have noinf and nonan.
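The two problem inputs named above can be observed directly with the standard library. This is a minimal runtime sketch under the assumption of default (non fast-math) floating-point settings; `logInputAvoidsErrnoCases` and `demoLogEdgeCases` are hypothetical helpers, and the real transform reasons about values statically in the optimizer rather than checking them at run time.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true when x is strictly positive, i.e. neither of
// the two cases described above (zero or negative input) applies.
// NaN inputs compare false and are conservatively rejected.
bool logInputAvoidsErrnoCases(double x) { return x > 0.0; }

void demoLogEdgeCases() {
  // log(0) is a pole error: the result is negative infinity.
  double atZero = std::log(0.0);
  assert(std::isinf(atZero) && atZero < 0.0);

  // log of a negative number is a domain error: the result is NaN.
  assert(std::isnan(std::log(-1.0)));

  // A strictly positive input hits neither case.
  assert(logInputAvoidsErrnoCases(2.0));
  assert(!logInputAvoidsErrnoCases(0.0));
  assert(!logInputAvoidsErrnoCases(-1.0));
}
```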
Follow-up to llvm#106229: add an overload of the create-pass function. --------- Co-authored-by: jingzec <[email protected]>
- Add handling for unsigned integers to hlsl_elementwise_sign
- Use `select` instead of adding dx and spirv intrinsics for unsigned integers as [discussed previously](llvm#101988 (comment))

fixes llvm#70078

### Related PRs
- llvm#101987
- llvm#101988
- llvm#101989

cc @farzonl @pow2clk @bob80905 @bogner @llvm-beanz
Saves me searching for this every time someone asks.
…XTRACT_SUBVECTOR(V,C1+C2) (llvm#111685) Extract from the original source vector whenever possible. This removes a number of dependency bottlenecks and helps a number of shuffle combining cases: either by allowing us to avoid a cross-lane variable shuffle on a slow target by keeping the instruction count below the threshold, or on fast targets by making it easier to recognise that the subvectors all came from the same source.
…m#111747) This module is used in various helper scripts since llvm#93712
…lvm#111720) Fixes missing m0 initialization for pre-gfx9 targets with local extending loads.
Implement the addMachineSSAOptimizations passes for AMDGPU. Porting the other generic passes in this category is WIP.
Run ArgumentPromotion before IPSCCP in the LTO pipeline, to expose more constants to be propagated. We also run PostOrderFunctionAttrs to improve the information available to ArgumentPromotion's alias analysis, and SROA to clean up allocas.
FMarno
commented
Oct 17, 2024
static std::optional<uint32_t>
getIntelReqdSubGroupSize(FunctionOpInterface func) {
  constexpr llvm::StringLiteral discardableIntelReqdSubgroupSize =
Owner
Author
It would be good if we could get this name from a function, such as an `IntelReqdSubgroupSizeAttrName` accessor.
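One way to read this suggestion is that the attribute name should come from a single accessor rather than a string literal local to the lookup function. The sketch below is a plain C++ stand-in, not MLIR code: `AttrDict`, `exampleReqdSubGroupSizeAttrName`, `getReqdSubGroupSize`, and the placeholder name string are all hypothetical, since the real attribute name is not shown in the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for a function's discardable attribute dictionary.
using AttrDict = std::map<std::string, std::uint32_t, std::less<>>;

// Hypothetical accessor: the single source of truth for the attribute name.
// The string is a placeholder, not the real attribute name.
constexpr std::string_view exampleReqdSubGroupSizeAttrName() {
  return "example.reqd_sub_group_size";
}

// Looks the attribute up through the accessor instead of a local literal.
std::optional<std::uint32_t> getReqdSubGroupSize(const AttrDict &attrs) {
  auto it = attrs.find(exampleReqdSubGroupSizeAttrName());
  if (it == attrs.end())
    return std::nullopt;
  return it->second;
}

void demoAttrLookup() {
  AttrDict attrs;
  attrs.emplace(std::string(exampleReqdSubGroupSizeAttrName()), 16u);
  assert(getReqdSubGroupSize(attrs).value_or(0) == 16u);
  assert(!getReqdSubGroupSize(AttrDict{}).has_value());
}
```

With this shape, code that sets the attribute and code that reads it share one definition of the name, so the two cannot drift apart.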
// CHECK-SAME: %[[I32_VAL:.*]]: i32, %[[I64_VAL:.*]]: i64,
// CHECK-SAME: %[[F16_VAL:.*]]: f16, %[[F32_VAL:.*]]: f32,
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32) {
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32)
Owner
Author
Suggested change
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32)
// CHECK-SAME: %[[F64_VAL:.*]]: f64, %[[OFFSET:.*]]: i32) attributes {gpu.known_subgroup_size = 16 : i32} {
include the attribute in the check
Also use it for lowering in GPUToLLVMSPV
…_size In the GPU To LLVM SPV patterns
65542dd to
31fd327
Compare
FMarno
pushed a commit
that referenced
this pull request
Oct 22, 2024
…en issue Since llvm#109628 landed, this test has been failing on 32-bit Arm. This is due to a codegen problem (whether added or uncovered by the change, not known) where the trap instruction is placed after the frame pointer and link register are restored. llvm#113154

So the code was:
```
std::__1::vector<int>::operator[](unsigned int):
  sub sp, sp, #8
  str r0, [sp, #4]
  str r1, [sp]
  add sp, sp, #8
  .inst 0xe7ffdefe
  bx lr
```
When lldb saw the trap, the PC was inside operator[] but the frame information actually pointed to g.

This bug only happens for leaf functions so adding a return type works around it:
```
std::__1::vector<int>::operator[](unsigned int):
  push {r11, lr}
  mov r11, sp
  sub sp, sp, #8
  str r0, [sp, #4]
  str r1, [sp]
  mov sp, r11
  pop {r11, lr}
  .inst 0xe7ffdefe
  bx lr
```
(and operator[] should return T& anyway)

Now the PC location and frame information should match and the test passes.
No description provided.